Optimal Bayesian 2D-Discretization for Variable Ranking in Regression
Abstract
In supervised machine learning, variable ranking sorts the input variables by their relevance with respect to an output variable. In this paper, we propose a new relevance criterion for variable ranking in regression problems with a large number of variables. The criterion comes from a joint discretization of the input and output variables, derived as an extension of a Bayesian nonparametric discretization method for the classification case. To this end, we introduce a family of discretization grid models and a prior distribution defined on this model space. For this prior, we then derive the exact Bayesian model selection criterion. The resulting most probable grid partition of the data emphasizes the relation (or the absence of relation) between input and output and provides a ranking criterion for the input variables. Preliminary experiments on both synthetic and real data demonstrate the criterion's capacity to select the most relevant variables and to improve a regression tree.
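To make the idea of ranking variables through a 2D grid concrete, here is a minimal sketch. It is not the paper's Bayesian criterion: it swaps in a fixed equal-frequency grid and the empirical mutual information of the grid cells as a stand-in relevance score, purely for illustration. The function names (`equal_frequency_bins`, `grid_mutual_information`, `rank_variables`), the bin count, and the synthetic data are all assumptions introduced here.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index in [0, n_bins) based on its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def grid_mutual_information(x, y, n_bins=4):
    """Empirical mutual information (in nats) of the n_bins x n_bins grid.

    Stand-in relevance score; NOT the exact Bayesian criterion of the paper.
    """
    n = len(x)
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    cell = Counter(zip(bx, by))  # counts per grid cell
    row = Counter(bx)            # counts per input interval
    col = Counter(by)            # counts per output interval
    return sum(
        (c / n) * math.log(c * n / (row[i] * col[j]))
        for (i, j), c in cell.items()
    )


def rank_variables(columns, y, n_bins=4):
    """Return (name, score) pairs sorted from most to least relevant."""
    scores = {
        name: grid_mutual_information(col, y, n_bins)
        for name, col in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical synthetic data: one informative input, one pure-noise input.
rng = random.Random(0)
n = 400
informative = [rng.uniform(-1, 1) for _ in range(n)]
noise = [rng.uniform(-1, 1) for _ in range(n)]
target = [v ** 2 + rng.gauss(0, 0.05) for v in informative]

ranking = rank_variables({"informative": informative, "noise": noise}, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch the grid is fixed in advance, whereas the abstract describes selecting the most probable grid among a family of grid models under a prior. A grid with a single cell in either dimension would correspond to the "absence of relation" case mentioned above.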
Similar resources
VARIATIONAL DISCRETIZATION AND MIXED METHODS FOR SEMILINEAR PARABOLIC OPTIMAL CONTROL PROBLEMS WITH INTEGRAL CONSTRAINT
The aim of this work is to investigate variational discretization and mixed finite element methods for optimal control problems governed by semilinear parabolic equations with an integral constraint. The state and co-state are approximated by the lowest-order Raviart-Thomas mixed finite element spaces, and the control is not discretized. Optimal error estimates in L2 are established for the state...
Investigating the Factors Affecting Inflation in Iran Based on the Bayesian Model Averaging (BMA) and Weighted-Average Least Squares (WALS) Approaches
In this research, Bayesian model averaging and weighted-average least squares methods are used to show how 14 variables affected inflation over the period 1974-2007. In addition, the Vselect program is used to identify the optimal model for every independent variable. The results show that growth in the price index of imported goods is the main factor behind inflation in Iran's economy. In ranking these 14 factors –t...
Bayesian Quantile Regression with Adaptive Lasso Penalty for Dynamic Panel Data
Dynamic panel data models form an important part of medical, social, and economic studies. The presence of the lagged dependent variable as an explanatory variable is a characteristic trait of these models. The estimation problem for these models arises from the correlation between the lagged dependent variable and the current disturbance. Recently, quantile regression to analyze dynamic pa...
Inflation Behavior in Top Sukuk Issuing Countries: Using a Bayesian Log-linear Model
This paper focuses on developing a model to study the effect of sukuk issuance on the inflation rate in the top sukuk-issuing Islamic economies in 2014. Because the available sample size is small, a Bayesian approach to the regression model is used, which contains key supply- and demand-side factors in addition to the outstanding sukuk volume as potential determinants of inflation rate...
Regression Methods Applied to Flight Variables for Situational Awareness Estimation Using Dynamic Bayesian Networks
Situational awareness can be a valuable indicator of the performance of flight crews, and the way pilots manage navigation information can be relevant to its estimation. In this research, dynamic Bayesian networks are applied to a dataset of variables both collected in real time during simulated flights and augmented with expert knowledge. This paper compares different approaches to the discretizati...
Publication date: 2006